51 research outputs found

N-gram analysis of 970 microbial organisms reveals presence of biological language models

Author: A Campbell
A Poddar
A Tomovic
AL Demain
AL Demain
BR King
BY Cheng
C Woese
CD Manning
D Tauritz
DJ McFarlane
DT Pride
DW Hosmer
E Buehler
F Daeyaert
GM Pavlovic-Lazetic
Hatice Ulku Osmanbeyoglu
J Qi
JC Schmitt
JO McInerney
K Fukami-Kobayashi
K Lee
L Bahl
LG Rahme
M Ganapathiraju
M Ganapathiraju
M Ganapathiraju
Madhavi K Ganapathiraju
MW van Passel
NS Mitic
P Engel
P Meng
R Hershberg
S Karlin
S Yang
TD Heer
TS Rani
V Kešelj
VV Solovyev
WB Cavnar
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: It has been suggested previously that genome and proteome sequences show characteristics typical of natural-language texts, such as "signature-style" word usage indicative of authors or topics, and that algorithms originally developed for natural language processing may therefore be applied to genome sequences to draw biologically relevant conclusions. Following this approach of "biological language modeling", statistical n-gram analysis was previously applied to compare whole proteome sequences of 44 organisms. That work showed that a few particular amino acid n-grams are abundant in one organism but occur very rarely in others, thereby serving as genome signatures. At that time, proteomes of only 44 organisms were available, which limited the generalization of this hypothesis. Today nearly 1,000 genome sequences and corresponding translated sequences are available, making it feasible to test the existence of biological language models over the evolutionary tree. Results: We studied whole proteome sequences of 970 microbial organisms using n-gram frequencies and cross-perplexity, employing the Biological Language Modeling Toolkit and the Patternix Revelio toolkit. Genus-specific signatures were observed even in a simple unigram distribution. When the statistical n-gram model of one organism was taken as the reference and the cross-perplexity of all other microbial proteomes was computed against it, cross-perplexity was found to be predictive of branch distance in the phylogenetic tree. For example, a 4-gram model from the proteome of Shigella flexneri 2a, which belongs to the class Gammaproteobacteria, showed a self-perplexity of 15.34, while the cross-perplexity of other organisms ranged from 15.59 to 29.5 and was proportional to their branching distance from S. flexneri in the evolutionary tree. The organisms of this genus, which happen to be pathotypes of E. coli, also have the closest perplexity values with E. coli. Conclusion: Whole proteome sequences of microbial organisms have been shown to contain particular n-gram sequences that are abundant in one organism but occur very rarely in other organisms, thereby serving as proteome signatures. Further, it has also been shown that perplexity, a statistical measure of similarity of n-gram composition, can be used to predict evolutionary distance within a genus in the phylogenetic tree.
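The self-perplexity and cross-perplexity described in this abstract can be illustrated with a minimal sketch. This is not the Biological Language Modeling Toolkit; the function names `ngram_counts` and `perplexity`, the add-one smoothing, and the toy sequences are assumptions made for illustration only.

```python
import math
from collections import Counter

def ngram_counts(seqs, n):
    """Count n-grams and their (n-1)-length contexts in the reference sequences."""
    counts, contexts = Counter(), Counter()
    for s in seqs:
        for i in range(len(s) - n + 1):
            gram = s[i:i + n]
            counts[gram] += 1
            contexts[gram[:-1]] += 1
    return counts, contexts

def perplexity(model, seqs, n, vocab_size=20):
    """Perplexity of seqs under the reference model.

    Uses add-one smoothing over an assumed 20-letter amino acid alphabet,
    so unseen n-grams get a small non-zero probability.
    """
    counts, contexts = model
    log_sum, total = 0.0, 0
    for s in seqs:
        for i in range(len(s) - n + 1):
            gram = s[i:i + n]
            p = (counts[gram] + 1) / (contexts[gram[:-1]] + vocab_size)
            log_sum += math.log2(p)
            total += 1
    if total == 0:
        raise ValueError("sequences are shorter than n")
    return 2 ** (-log_sum / total)

# Toy data (hypothetical): a reference "proteome" and a dissimilar one.
reference = ["ACDACDACDACD"]
other = ["WYWYWYWYWY"]
model = ngram_counts(reference, 2)
print(perplexity(model, reference, 2))  # self-perplexity
print(perplexity(model, other, 2))      # cross-perplexity
```

In this sketch the reference scores a lower perplexity against its own model than the dissimilar sequence does, which is the qualitative behavior the abstract reports between related and distant organisms.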

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

Bioinformatics research in the Asia Pacific: a 2007 update

Author: A Madhumalar
BC Kim
C Wang
CJO Baker
D Gilbert
DT Singh
GL Zhang
H Sugawara
H Zhao
KH Choo
L Kong
M Ganapathiraju
Michael Gribskov
N Yanamala
O Miotto
O Miotto
PD Yoo
Q Xu
R Ördög
RTH Tsai
S Dastmalchi
S Miyano
S Ranganathan
S Ranganathan
S Ranganathan
SH Chen
SH Nagaraj
Shoba Ranganathan
Tin Wee Tan
U Sangket
V Chelliah
WY Kim
YP Lim
Publication venue: BioMed Central
Publication date: 01/01/2008

We provide a 2007 update on bioinformatics research in the Asia-Pacific from the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, set up in 1998. From 2002, APBioNet has organized the first International Conference on Bioinformatics (InCoB), bringing together scientists working in the field of bioinformatics in the region. This year, the InCoB2007 Conference was organized as the 6th annual conference of the Asia-Pacific Bioinformatics Network, on Aug. 27–30, 2007 in Hong Kong, following a series of successful events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea) and New Delhi (India). Besides the scientific meeting in Hong Kong, the satellite events are a pre-conference training workshop in Hanoi, Vietnam and a post-conference workshop in Nansha, China. This Introduction provides a brief overview of the peer-reviewed manuscripts accepted for publication in this Supplement. We have organized the papers into thematic areas, highlighting the growing contribution of research excellence from this region to global bioinformatics endeavours.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Purdue E-Pubs

Macquarie University ResearchOnline

ScholarBank@NUS

Estimating the Worldwide Extent of Illegal Fishing

Author: A Anganuzzi
A Birkun
A Falaye
A Vaisman
A Zuleta
AB Benavente-Villena
CH Ainsworth
CH Ainsworth
CH Ainsworth
CL Kuo
D Brack
D Kaufmann
D Pauly
D Tesfamichael
DA Varkey
David J. Agnew
DJ Agnew
DJ Agnew
DJ Agnew
DM Weidner
G Hønneland
G Matishov
G Morgan
G Pramod
Ganapathiraju Pramod
HM Lozano-Montes
J Brashares
J Putt
JG Butcher
JG Butcher
JG Lambsdorff
John Pearce
John R. Beddington
JR Beddington
K Bray
K Kelleher
KI Hariri
KR Patterson
L Joseph
M Baddyr
M Bailey
M Chimanovitch
M Esmark
M Gianni
M Kalentchenko
M Melnychuk
M Pe
MA Palma
N Willoughby
NI Pearse
P Bernal
P Flewweling
P Flewwelling
PT Rajan
R Long
R Watson
R Watson
Reg Watson
S Castillo
S Clarke
S Guénette
S Nurhakim
S Nurhakim
S Tudela
S Tudela
SA Taghavi
Stuart A. Sandin
T Morato
TJ Pitcher
TJ Pitcher
TJ Pitcher
Tom Peatman
Tony J. Pitcher
UR Sumaila
V Restrepo
Publication venue: Public Library of Science
Publication date: 01/01/2009

Illegal and unreported fishing contributes to overexploitation of fish stocks and is a hindrance to the recovery of fish populations and ecosystems. This study is the first to undertake a world-wide analysis of illegal and unreported fishing. Reviewing the situation in 54 countries and on the high seas, we estimate that lower and upper estimates of the total value of current illegal and unreported fishing losses worldwide are between $10 bn and $23.5 bn annually, representing between 11 and 26 million tonnes. Our data are of sufficient resolution to detect regional differences in the level and trend of illegal fishing over the last 20 years, and we can report a significant correlation between governance and the level of illegal fishing. Developing countries are most at risk from illegal fishing, with total estimated catches in West Africa being 40% higher than reported catches. Such levels of exploitation severely hamper the sustainable management of marine ecosystems. Although there have been some successes in reducing the level of illegal fishing in some areas, these developments are relatively recent and follow growing international focus on the problem. This paper provides the baseline against which successful action to curb illegal fishing can be judged.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis

Author: A Ben-Hur
A Floratos
AR Shah
B Qian
B Rost
B-J Webb-Robertson
Bin Liu
C Leslie
CG Nevill-Manning
CS Leslie
H Ogul
H Rangwala
H Saigo
I Rigoutsos
J Bellegarda
J Shawe-Taylor
K Karplus
L Holm
L Liao
Lei Lin
M Ganapathiraju
M Gribskov
Q Dong
Q Dong
Q Dong
Q Dong
Q Dong
Qiwen Dong
QJ Su
QW Dong
R Kuang
S Henikoff
SE Brenner
SE Dowd
SF Altschul
SF Altschul
T Damoulas
T Håndstad
T Jaakkola
T Lingner
TF Smith
TK Landauer
TL Bailey
VN Vapnik
WR Pearson
WS Noble
Xiaolong Wang
Xuan Wang
Y Hou
Y Hou
Y Yang
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is to find a suitable representation of protein sequences. Results: In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the number of occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods. Conclusion: The method based on Top-n-grams significantly outperforms methods based on many other building blocks, including N-grams, patterns, motifs and binary profiles. Therefore, the Top-n-gram is a good building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, the prediction of domain boundaries, the design of knowledge-based potentials and the prediction of protein binding sites.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?

Abstract. Background: A wealth of unannotated and functionally unknown protein sequences has accumulated in recent years with rapid progress in sequence genomics, giving rise to ever-increasing demand for methods that efficiently assess functional sites. Sequence and structure conservation have traditionally been the major criteria adopted in various algorithms to identify functional sites. Here, we focus on the distributions of the 20³ different types of 3-grams (triplets of sequentially contiguous amino acids) in the entire space of sequences accumulated to date in the UniProt database, and in particular on the rare 3-grams distinguished by their high entropy-based information content. Results: Comparison of the UniProt distributions with those observed at or near the active sites in a non-redundant dataset of 59 enzyme/ligand complexes shows that the active sites preferentially recruit 3-grams distinguished by their low frequency in UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in detail to illustrate the biological significance of the results. Conclusion: The results suggest that recruitment of rare 3-grams may be an efficient mechanism for increasing specificity at functional sites. Rareness/scarcity emerges as a feature that may assist in identifying key sites for protein function, providing information complementary to that derived from sequence alignments. In addition, it provides us (for the first time) with a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.
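The idea of ranking 3-grams by rarity can be sketched as follows. The abstract does not give the exact definition of its entropy-based information content, so this sketch assumes plain self-information, -log2(p), computed from observed 3-gram frequencies; the function name `trigram_information` and the toy sequence are hypothetical.

```python
import math
from collections import Counter

def trigram_information(seqs):
    """Map each observed 3-gram to its self-information -log2(p) in bits.

    p is the 3-gram's relative frequency across all input sequences
    (an assumed stand-in for the paper's information-content measure).
    Sequences shorter than three residues contribute nothing.
    """
    counts = Counter(s[i:i + 3] for s in seqs for i in range(len(s) - 2))
    total = sum(counts.values())
    return {gram: -math.log2(c / total) for gram, c in counts.items()}

# Toy example: the 3-gram "AAW" occurs once, "AAA" three times.
info = trigram_information(["AAAAAW"])
rarest_first = sorted(info, key=info.get, reverse=True)
print(rarest_first)
```

Rarer 3-grams receive higher scores, so sorting in descending order lists the candidates that, under this assumption, would be of most interest near functional sites.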

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Phonemes: Lexical access and beyond

Author: A Basso
A Bürki
A Caramazza
A Cutler
A Cutler
A Cutler
A Cutler
A El Aissati
A Ganapathiraju
A Hanulikova
A Hervais-Adelman
A Lahiri
A Lahiri
A Nevins
A Prince
A Pufahl
A Rosenberg
A Weber
AB Lord
AC Cohn
AE Ades
AF Healy
AG Samuel
AM Liberman
AM Liberman
AM Trude
AP Salverda
AP Salverda
AR Smith
B Bagemihl
B McMurray
B McMurray
B Tranel
B Vaux
BA Church
BE Dresher
C Fowler
C Read
CA Fowler
CJ Davis
CO Orgun
CP Browman
D Dahan
D Dahan
D Gil
D Huttenlocher
D Jones
D Lee
D Norris
D Norris
D Norris
D Poeppel
D Poeppel
D Poeppel
D Walker
DA Swinney
DB Pisoni
DB Pisoni
DF Kleinschmidt
DH Klatt
DH Klatt
DH Klatt
DH Whalen
DJ Foss
DJ Foss
DJ Foss
DL Schacter
DL Schacter
DL Schacter
DW Gow Jr
DW Massaro
DW Massaro
DW Massaro
E Baković
E Birney
E Raimy
E Reinisch
E Sapir
E Sievers
E Spinelli
EL Newport
F Guenther
G Hickok
G Hickok
G Hickok
G Yeni-Komshian
GC Oden
GE Peterson
GS Dell
H Mitterer
HB Savin
I Hanique
I Király
IB Yildiz
IC Ward
IG Mattingly
IY Liberman
J Baudouin de Courtenay
J Berko
J Bertoncini
J Grainger
J Liu
J Local
J Mehler
J Mielke
J Morais
J Morais
J Morais
J Morton
J Rubach
J Segui
J Sherzer
JA Barlow
JB Pierrehumbert
JB Pierrehumbert
JB Pierrehumbert
JB Pierrehumbert
JC Toscano
JE Cutting
JE Cutting
Jeffrey S. Bowers
JI Vousden
JL McClelland
JM Foley
JM McQueen
JM McQueen
JM Toro
JR Saffran
JR Saffran
JR Saffran
JS Bowers
JS Bowers
JS Bowers
JS Perkell
JW Bohland
K Johnson
K Lukatela
KE Chambers
KN Stevens
KN Stevens
KN Stevens
KN Stevens
KW Church
KW Church
L Goldstein
L Lisker
L Osterhout
L Osterhout
L Rabiner
L Stockall
LL Bonatti
M Ahissar
M Coath
M Coltheart
M Halle
M Keetels
M Reilly
M Studdert-Kennedy
M Wolmetz
MF Damian
MG Gaskell
MH Davis
MH Davis
N Chomsky
N Chomsky
N Dumay
N Kazanina
N Mesgarani
Nina Kazanina
O Fujimura
O Fujimura
P Graf
P Rubin
PA Luce
PK Kuhl
PW Jusczyk
R Drullman
R Frost
R Frost
R Jakobson
RA Hayes
RB Lea
RE Remez
RF Port
RF Port
RF Port
RJ Zatorre
RL Diehl
RL Diehl
RM Atchley
RM Warren
S Bolozky
S Boudelaa
S Boudelaa
S Coulson
S Davis
S Greenberg
S Heaney
S Shamma
S Shamma
S Shamma
S Shamma
S Shattuck-Hufnagel
S Shattuck-Hufnagel
S Wu
SA Schane
SD Goldinger
SE Blumstein
SG Guion
SM Sheffert
SR Anderson
SV David
T Cho
TB Klein
TJ Palmeri
U Frauenfelder
VA Fromkin
W Marslen-Wilson
WA Wickelgren
WA Wickelgren
WD Marslen-Wilson
WH Auden
William Idsardi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2018

Crossref

Explore Bristol Research

Examining the use of Antibiotics in Pediatric Cases of Community Acquired Pneumonia at LVHN

Author: Dean Wesley, MD
Ganapathiraju Meghana
Prendergast Kristen M, MD
Publication venue: LVHN Scholarly Works
Publication date: 01/06/2023

Lehigh Valley Health Network: LVHN Scholarly Works

Characterization of protein secondary structure

Author: Balakrishnan N.
Ganapathiraju M. K.
Klein-Seetharaman J.
Reddy R.
Publication venue: IEEE
Publication date: 01/05/2004

What do proteins look like? Proteins are composed of fundamental building blocks called amino acids. When a protein is synthesized by the cell, it is initially just a string of amino acids. In a process called protein folding, this string arranges itself into a complex three-dimensional structure capable of carrying out the function of the specific protein. We briefly review the fundamental building blocks of proteins and their primary and secondary structure.